A new class of random search algorithms for stochastic optimization is presented. The designer has the option to employ a learning memory in order to reduce the cost of the optimization process, measured in terms of the number of observations. The asymptotic properties of the procedure are discussed, and new probability-theoretical techniques are used in the proof of convergence.

I. Introduction

Let $Q$ be an unknown real-valued function on a set $B\subset\mathbb{R}^m$, where $m\ge1$. In many applications, one is interested in finding a $w$ in $B$ for which $Q(w)$ is nearly minimal. Because nothing is known about the continuity, differentiability, smoothness or unimodality of $Q$, or because of the special nature of $B$ (for example, $B$ can be a countable set of isolated points of $\mathbb{R}^m$), it is not possible to use a classical optimization technique such as the gradient method. It is known that in such situations random search can be used successfully (for a review of the literature, see [1,2,3]).

In this paper, we are interested in the stochastic optimization problem, that is, $Q(w)$ is no longer exactly computable but can be estimated if enough observations are averaged. To be explicit, it is assumed that for all $w\in B$ one can observe (compute, etc.) $Y_1,Y_2,\ldots,Y_n,\ldots$, where the $Y_i$ are independent random variables all distributed as $Y$, with distribution function $F_w$ and mean
$$Q(w)=\int y\,dF_w(y)=E_w\{Y\}.$$

Several people have tried the random search algorithms used in deterministic optimization, with the result that there are as many heuristic random search methods as there are scientists studying the stochastic optimization problem.

The most widely studied random search technique for stochastic optimization is the algorithm of Gurin [4] or one of its modifications [3,5]. Gurin's algorithm is simple and can be used for general $B$ and $Q$. However, proving convergence for the modified methods has become increasingly difficult. Furthermore, Gurin's method is very inefficient with respect to the number of measurements (observations). If $B$ is a finite set of points, one can use stochastic automata with a variable structure [6,7] or probabilistic strategy selection methods [8], most of which are proved to be convergent in some probabilistic sense. If $Q$ satisfies some regularity conditions, usually in terms of continuity, differentiability and unimodality, local hill-climbing methods may be used. Most of these techniques are derived from the Kiefer-Wolfowitz stochastic approximation algorithm [10,11], the stochastic gradient algorithm [12-15] and combinations of these algorithms with stochastic automata and random search [16-17]. For instance, if $Q$ is continuous and the accuracy of the solution is of no great importance, one can always partition $B$ into a finite number of sets and consider each set as a single point in a new space, thus reducing the problem to a finite optimization problem (see [9]).

The classical random search algorithm is a sequential procedure to update the best estimate of the minimum in which, in the search for a new best estimate, only the very recent history of the search is taken into account. This algorithm thus operates with a short memory. However, over the last five years two factors in the design of optimization systems have changed. First, computers have become very fast and can handle very large active memories. Second, the cost of taking measurements (i.e. collecting data, evaluating a performance, etc.) has gone up considerably because of the increased cost of manpower. This has made the cost of storing and processing data decrease relative to the cost of obtaining the data. This trend has been recognized by several authors (e.g. [18]). So, one wants to develop an algorithm which

(i) uses the available information as well as possible, e.g. by storing the past observations and processing the data obtained during the search in an intelligent way;

(ii) guarantees that the best estimate of the minimum converges, in some probabilistic sense, to the minimum of $Q$.

In this paper a statistical search method with a potentially growing memory is developed. The rate of convergence to the minimum is expected to be high due to the learning behavior of the memory. Maclaren [19] proposed, in a control engineering application, to use a stochastic automaton with a variable structure and a growing number of states to tackle a special stochastic optimization problem. However, the convergence problem for his method is not satisfactorily solved, and its field of application is very narrow. Our approach does not resemble any other method available in the literature and is partially modeled on the learning process in the human brain. "Remembering exceptional facts", "forgetting the too distant past" and "averaging costs" are features that can be recognized in the algorithm. The theoretical value of the method is that it encompasses the well-known random optimization method of Matyas [20] for deterministic optimization as a special case. The emphasis is on the new method for proving the convergence of the algorithm in stochastic optimization problems. The techniques, which differ from those employed in [4,5], depend upon some powerful probability-theoretical inequalities [22].


II. Problem Formulation

Let $(\Omega,\mathcal{C},P)$ be a probability space and let $B$ be a closed set from $\mathbb{R}^m$. Let $\mathcal{B}^m_B$ be the $\sigma$-algebra of all the Borel sets that are contained in $B$. We assume that there exists a measurable mapping $h$ from $(\Omega\times B,\ \mathcal{C}\times\mathcal{B}^m_B)$ to $(\mathbb{R},\mathcal{B})$, where $\mathcal{B}$ is the class of Borel sets from $\mathbb{R}$. Notice that for every $w$ in $B$, $Y=h(\omega,w)$ is a random variable on $(\Omega,\mathcal{C},P)$. We say that a collection
$$\mathcal{E}=\{F_w \mid w\in B\} \qquad (1)$$
of distribution functions is a random environment with search domain $B$ if $B$ is a closed set from $\mathbb{R}^m$ and if there exist a probability space $(\Omega,\mathcal{C},P)$ and an $(\Omega\times B,\ \mathcal{C}\times\mathcal{B}^m_B)$-$(\mathbb{R},\mathcal{B})$ measurable function $h$ such that for all $y\in\mathbb{R}$,
$$F_w(y)=P\{\omega \mid \omega\in\Omega,\ h(\omega,w)\le y\}.$$
Notice that if $B$ is countable, then such a probability space and a measurable function $h$ can always be found.

Thus it makes sense to define a countably infinite (finite) random environment as a countable (finite) collection of distribution functions. The reason for definition (1) is the following. If $W$ is any random vector on some probability space $(\Omega',\mathcal{C}',P')$ that is different from $(\Omega,\mathcal{C},P)$, and $W$ takes values in $B$, then $Y=h(\omega,W)$ is a random variable on the product of both probability spaces. Furthermore, if $W_1,\ldots,W_n$ is any sequence of random vectors (all taking values in $B$) that are applied to the environment, then there exists a sequence $Y_1,\ldots,Y_n$ of random variables (referred to as responses of the environment, measurements, observations or observed losses) such that, given $W_1=w_1,\ldots,W_n=w_n$ ($w_i\in B$, $i=1,\ldots,n$), the $Y_i$ are independent and $Y_i$ has distribution function $F_{w_i}$, $1\le i\le n$.

We will refer to $Q(w)=\int y\,dF_w(y)=E_w\{Y\}$ as the stochastic performance index. By assumption, $Q$ is a Borel measurable function from $B$ to $\mathbb{R}$. Except for $B$, we assume that there is no a priori knowledge about $\mathcal{E}$ or $Q$. The stochastic optimization problem is to find sequentially a value $w$ in $B$ for which $Q$ is minimal or nearly minimal.
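As a minimal illustration of definition (1), the sketch below builds a random environment for a scalar search domain ($m=1$) under an assumption that the text does not make: each response is $Q(w)$ plus independent gaussian noise, so that $F_w$ is normal with mean $Q(w)$. The function name `make_noisy_environment` is hypothetical; with `sigma = 0` the environment is deterministic.

```python
import random
from typing import Callable


def make_noisy_environment(Q: Callable[[float], float], sigma: float,
                           seed: int = 0) -> Callable[[float], float]:
    """Return observe(w), playing the role of one response Y = h(omega, w).

    Assumption for illustration only: Y = Q(w) + N(0, sigma^2) noise, drawn
    independently at every call, so that E_w{Y} = Q(w).
    """
    rng = random.Random(seed)

    def observe(w: float) -> float:
        return Q(w) + rng.gauss(0.0, sigma)

    return observe


observe = make_noisy_environment(lambda w: (w - 1.0) ** 2, sigma=0.5)
avg = sum(observe(3.0) for _ in range(20000)) / 20000
print(round(avg, 2))  # close to Q(3) = 4
```

Averaging many responses at a fixed point recovers $Q(w)$, which is exactly the mechanism the search procedure relies on.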

We assume that there is a random generator with support in $B$, i.e. a device for generating a sequence $W_1,\ldots,W_n,\ldots$ of iid (independent identically distributed) random vectors taking values in $B$ and distributed as $W$, where $W$ has a distribution function $G$ which is either known or unknown. The minimum of $Q$ with respect to $G$ is
$$q_{\min}=\operatorname{ess\,inf}\,Q(W) \qquad (2)$$
where the essential infimum is defined as usual [2]. Actually, provided that $q_{\min}>-\infty$, $q_{\min}$ is the unique number with the property that for all $\varepsilon>0$,
$$P\{Q(W)<q_{\min}-\varepsilon\}=0 \quad\text{and}\quad P\{Q(W)\le q_{\min}+\varepsilon\}>0.$$
We remark that if $B$ is countable, say $B=\{w_1,w_2,\ldots\}$, and $G$ puts mass $g_i$ at $w_i$ such that $\sum_i g_i=1$ and $0\le g_i\le1$ for all $i$, then
$$q_{\min}=\inf_{i:\,g_i>0}Q(w_i).$$
In this case $q_{\min}$ is independent of $G$ as long as every $w_i$ receives positive probability from $G$.
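For a finite $B$ (a special case of the countable one) the formula above can be evaluated directly, since the infimum becomes a minimum. The following is a minimal sketch with a hypothetical function name; it only illustrates that points receiving no mass from $G$ do not count.

```python
import math
from typing import Sequence


def q_min_countable(Q_values: Sequence[float], g: Sequence[float]) -> float:
    """q_min = inf{Q(w_i) : g_i > 0} for a finite B = {w_1, ..., w_k}.

    Q_values[i] is Q(w_i) and g[i] is the mass that G puts on w_i; points
    that G never generates do not affect the essential infimum.
    """
    if len(Q_values) != len(g):
        raise ValueError("Q_values and g must have the same length")
    if any(gi < 0 for gi in g) or not math.isclose(sum(g), 1.0):
        raise ValueError("g must be a probability vector")
    return min(q for q, gi in zip(Q_values, g) if gi > 0)


# The smallest value -1.0 is ignored because G gives w_2 no mass.
print(q_min_countable([3.0, -1.0, 0.5], [0.5, 0.0, 0.5]))  # prints 0.5
```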

We categorize the random environments with search domain $B$ as follows.

(I) $\mathcal{E}$ is $\mathcal{D}$ (deterministic, noiseless) if for all $w\in B$, $Y=Q(w)$ wp1 (with probability one).

(II) $\mathcal{E}$ is $\mathcal{E}_t$ for some $t>0$ with parameter $L<\infty$ if
$$\sup_{w\in B}E_w\{|Y-Q(w)|^t\}=\sup_{w\in B}\int|y-Q(w)|^t\,dF_w(y)\le L. \qquad (3)$$

(III) $\mathcal{E}$ is $\mathcal{G}$ (generalized gaussian) with parameters $\sigma$ and $L$ ($0\le\sigma<\infty$, $0\le L<\infty$) if
$$\sup_{w\in B}E_w\{e^{\lambda(Y-Q(w))}\}\le e^{\sigma^2\lambda^2/(2(1-|\lambda|L))}$$
for all $\lambda$ with $|\lambda|L<1$.

If an environment is $\mathcal{G}$, then it certainly is $\mathcal{E}_t$ for all $t>0$. A deterministic environment is always generalized gaussian, and if $\mathcal{E}$ is $\mathcal{E}_t$ for some $t>0$ with parameter $L=0$, then $\mathcal{E}$ is deterministic. Also, if $\mathcal{E}$ is $\mathcal{E}_t$, then $\mathcal{E}$ is $\mathcal{E}_s$ for all $s\le t$. It should be pointed out that most environments of any practical interest are generalized gaussian. For instance, if all the $F_w$ are gaussian with variance $\sigma^2(w)\le\sigma^2$, then $\mathcal{E}$ is $\mathcal{G}$ with parameters $\sigma$ and $0$. If to all $w$ correspond probability measures that put their weights on $[d_1(w),d_2(w)]$, where $d_2(w)-d_1(w)\le L$, then $\mathcal{E}$ is generalized gaussian with parameters $L/2$ and $L$. In particular, if for all $w$ in $B$, $Y$ takes with probability one values in $\{0,1\}$ or $[0,1]$, then $\mathcal{E}$ is $\mathcal{G}$ with parameters $1/2$ and $1$. Such environments are often encountered in stochastic automata theory and discrete optimization.
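The claim that $\{0,1\}$-valued responses give a $\mathcal{G}$ environment with parameters $1/2$ and $1$ can be spot-checked numerically for Bernoulli responses, where $Q(w)=p$. The sketch compares the centered moment generating function with the bound in (III) on a grid of $p$ and $\lambda$. A grid check is only a sanity check, not a proof; the inequality itself follows from Hoeffding's lemma, $E\{e^{\lambda(Y-p)}\}\le e^{\lambda^2/8}$, since $\lambda^2/8\le\lambda^2/(8(1-|\lambda|))$. Function names are hypothetical.

```python
import math


def bernoulli_mgf_centered(p: float, lam: float) -> float:
    """E[exp(lam * (Y - p))] for Y ~ Bernoulli(p)."""
    return (1 - p) * math.exp(-lam * p) + p * math.exp(lam * (1 - p))


def gg_bound(sigma: float, L: float, lam: float) -> float:
    """Right-hand side exp(sigma^2 lam^2 / (2 (1 - |lam| L))) of condition (III)."""
    if abs(lam) * L >= 1:
        raise ValueError("the bound is only required for |lam| L < 1")
    return math.exp(sigma ** 2 * lam ** 2 / (2 * (1 - abs(lam) * L)))


# Grid check of the claim for {0,1}-valued responses: sigma = 1/2, L = 1.
ok = all(
    bernoulli_mgf_centered(p, lam) <= gg_bound(0.5, 1.0, lam) + 1e-12
    for p in [i / 20 for i in range(21)]
    for lam in [j / 100 for j in range(-99, 100)]
)
print(ok)
```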

The purpose is to find an optimization procedure that generates a sequence of random vectors $W_1,\ldots,W_n,\ldots$ taking values in $B$ such that $\max(Q(W_n),q_{\min})$ tends in some probabilistic sense to $q_{\min}$ as $n\to\infty$. Notice that we have to allow for the possibility that $Q(W_n)<q_{\min}$ for some $n$. Of course, if $W_1,\ldots,W_n,\ldots$ were a sequence of iid random vectors distributed as $W$, then $Q(W_n)\ge q_{\min}$ wp1 for all $n$.


III. The Optimization Procedure

Let $\{\alpha_n\}$ and $\{\beta_n\}$ be sequences from $[0,1]$ with $\alpha_n+\beta_n\le1$ for all $n$, and let $\{\lambda_n\}$ be a sequence of positive integers. Further, let $Z_1,Z_2,\ldots$ be a sequence of independent integer-valued random variables with
$$P\{Z_n=1\}=\alpha_n,\qquad P\{Z_n=0\}=\beta_n,\qquad P\{Z_n=-1\}=1-\alpha_n-\beta_n=\gamma_n. \qquad (4)$$
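A sketch of drawing $Z_n$ according to (4) from a single uniform variate; the function name `sample_Z` is hypothetical, and `alpha`, `beta` stand for $\alpha_n$ and $\beta_n$ at the current iteration.

```python
import random


def sample_Z(alpha: float, beta: float, rng: random.Random) -> int:
    """Draw Z_n with P{Z=1}=alpha, P{Z=0}=beta, P{Z=-1}=1-alpha-beta, as in (4)."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1 + 1e-12):
        raise ValueError("need alpha, beta in [0,1] with alpha + beta <= 1")
    u = rng.random()          # uniform on [0, 1)
    if u < alpha:
        return 1              # fresh draw from G
    if u < alpha + beta:
        return 0              # resample the basepoint
    return -1                 # designer's choice


rng = random.Random(1)
draws = [sample_Z(0.2, 0.5, rng) for _ in range(100000)]
print(draws.count(1) / len(draws), draws.count(0) / len(draws))
```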

To start the search, generate a random vector $W_0^*=W_0$ having distribution function $G$. Given that $W_0^*=w$, $\lambda_0$ measurements are made, say $Y_1',\ldots,Y_{\lambda_0}'$, all having distribution function $F_w$. Let the estimate of $Q(w)$ be $Y_0^*$, where
$$Y_0^*=\Bigl(\sum_{i=1}^{\lambda_0}Y_i'\Bigr)\Big/\lambda_0. \qquad (5)$$
Let $Y_0=Y_0^*$ and $N_0=N_0^*=\lambda_0$, where $N_0$ is the number of observations that were used in computing the average $Y_0$. The search procedure consists of generating two sequences of triples, $(W_n^*,Y_n^*,N_n^*)$ and $(W_n,Y_n,N_n)$, $n=1,2,\ldots$, where $(W_0^*,Y_0^*,N_0^*)=(W_0,Y_0,N_0)$. $W_n$ is the estimate at iteration $n$ of the minimum of $Q$ in $\mathbb{R}^m$, $Y_n$ is the corresponding estimate of $Q(W_n)$, and $N_n$ is the experience with $W_n$, that is, the number of observations that were used in the computation of the estimate $Y_n$.

Let the search be at iteration $n$. Then $W_n^*$ is generated as follows.

(i) If $Z_n=0$, let $W_n^*=W_{n-1}$.

(ii) If $Z_n=1$, let $W_n^*$ be an independent random vector with distribution function $G$.

(iii) If $Z_n=-1$, $W_n^*$ is arbitrary, with the restriction that $P\{W_n^*\in B\}=1$.


Given that $W_n^*=w$, $N_n^*=\lambda_n$ observations are made, say $Y_1',\ldots,Y_{\lambda_n}'$, all having distribution function $F_w$. Let the estimate of $Q(w)$ be
$$Y_n^*=\Bigl(\sum_{i=1}^{\lambda_n}Y_i'\Bigr)\Big/\lambda_n. \qquad (6)$$
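Steps (i)-(iii) and the averaging in (5)-(6) can be sketched as follows for a scalar search domain. Step (iii) leaves the choice to the designer, so `local_move` is a hypothetical placeholder supplied by the caller (for instance a gaussian perturbation of the basepoint); `draw_G` stands for the random generator with distribution function $G$ and `observe` for one response of the environment. All names are illustrative.

```python
from typing import Callable, Tuple


def propose(z: int, w_prev: float, draw_G: Callable[[], float],
            local_move: Callable[[float], float]) -> float:
    """Generate W*_n from Z_n following steps (i)-(iii)."""
    if z == 0:
        return w_prev             # (i) resample the current basepoint W_{n-1}
    if z == 1:
        return draw_G()           # (ii) independent draw from G
    return local_move(w_prev)     # (iii) designer's choice, must stay in B


def estimate(observe: Callable[[float], float], w: float,
             lam: int) -> Tuple[float, float, int]:
    """Average lam fresh observations at w, as in (5)-(6); returns (W*, Y*, N*)."""
    ys = [observe(w) for _ in range(lam)]
    return (w, sum(ys) / lam, lam)


print(propose(-1, 2.5, lambda: 9.0, lambda w: w + 1.0))
print(estimate(lambda w: w * w, 3.0, 4))
```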

Two questions naturally arise: (1) How does one pick $W_n^*$ if $Z_n=-1$, and how can the past observations be used to aid in picking $W_n^*$? (2) How does one find $(W_n,Y_n,N_n)$ given $(W_i^*,Y_i^*,N_i^*)$, $i=0,1,\ldots,n$, and $(W_{n-1},Y_{n-1},N_{n-1})$? We will refer to $W_n$ as the basepoint and to $(W_n,Y_n,N_n)$ as the base triple.

In Gurin's algorithm [3-5], to make the decision in answer to question (2), it is required that $\lambda_n$ additional measurements be made with $W_{n-1}$ to obtain an estimate $\bar Y_{n-1}$ of $Q(W_{n-1})$. The decision is based upon a comparison between $\bar Y_{n-1}$ and $Y_n^*$, that is, $W_n=W_{n-1}$ unless $\bar Y_{n-1}>Y_n^*+\varepsilon_n$ (where $\varepsilon_n>0$ is a threshold), in which case $W_n=W_n^*$. However, valuable data are wasted, since $Y_{n-1}$ is forgotten: it is as if $N_{n-1}$ measurements are thrown away at the $n$-th iteration. Therefore, we will not require special additional observations for decision (2), thus reducing the total cost of data collection.
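For comparison, Gurin's decision as described above can be sketched as follows. It re-measures the basepoint at every iteration and so spends $2\lambda_n$ observations, whereas the procedure of this paper spends only $\lambda_n$ observations on $W_n^*$ and reuses $Y_{n-1}$. This is an illustrative reading of the rule with hypothetical names and a scalar search domain, not Gurin's original formulation.

```python
from typing import Callable, Tuple


def gurin_step(observe: Callable[[float], float], w_base: float, w_new: float,
               lam: int, eps: float) -> Tuple[float, int]:
    """One decision of Gurin's rule as described in the text.

    Both the basepoint and the candidate receive lam fresh observations; the
    earlier estimate at the basepoint is discarded. Returns the new basepoint
    and the number of observations spent at this iteration.
    """
    y_base = sum(observe(w_base) for _ in range(lam)) / lam   # fresh estimate of Q(W_{n-1})
    y_new = sum(observe(w_new) for _ in range(lam)) / lam     # Y*_n
    w_next = w_new if y_base > y_new + eps else w_base
    return w_next, 2 * lam


print(gurin_step(lambda w: (w - 1.0) ** 2, 3.0, 1.5, 5, 0.1))
```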

Let $H_n$ be the data, outside the base triple, that are memorized at time $n$, where
$$H_n=\bigl((W_1^n,Y_1^n,N_1^n),\ldots,(W_{T_n}^n,Y_{T_n}^n,N_{T_n}^n)\bigr)$$
and where $T_n$ is a nonnegative integer-valued random variable. If $T_n=0$, then $H_n$ is empty. If $T_n\le M<\infty$ for all $n$, we say that the algorithm operates with a finite memory. If $T_n\to\infty$ as $n\to\infty$, then we say that the algorithm operates with a growing memory.

We require that $H_n$ be a measurable function of the $(W_i^*,Y_i^*,N_i^*)$, $i=0,1,\ldots,n$, and that at all times $W_n,W_1^n,\ldots,W_{T_n}^n$ are pairwise unequal. Therefore, $T_0=0$ and $T_n\le n$ for all $n$.

We now continue the description of the algorithm. First of all, it is clear that in picking $W_n^*$ when $Z_n=-1$, we can expect help from $H_{n-1}$ and $(W_{n-1},Y_{n-1},N_{n-1})$. Given $(W_n^*,Y_n^*,N_n^*)$, $(W_{n-1},Y_{n-1},N_{n-1})$ and $H_{n-1}$, we will compute $(W_n,Y_n,N_n)$ in two steps. First an auxiliary triple $(\hat W_n^*,\hat Y_n^*,\hat N_n^*)$ is obtained. Define a random variable $S_n$ by
$$S_n=\begin{cases}1 & \text{if } W_n^*=W_{n-1},\\ 2 & \text{if } W_n^*=W_i^{n-1}\text{ for some } W_i^{n-1}\text{ from } H_{n-1},\\ 3 & \text{otherwise.}\end{cases} \qquad (7)$$
Note that $S_n$ is uniquely defined since $W_{n-1},W_1^{n-1},\ldots,W_{T_{n-1}}^{n-1}$ are pairwise unequal. Define further the merging of two triples $(W,Y,N)$ and $(W,Y',N')$ as follows:
$$(W,Y,N)\oplus(W,Y',N')=\Bigl(W,\ \frac{NY+N'Y'}{N+N'},\ N+N'\Bigr). \qquad (8)$$
Thus, the experience of the new triple is the sum of the experiences of the component triples.
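The classification (7) and the merging operation (8) are simple to express with triples stored as tuples. The sketch assumes scalar points compared by exact equality, which matches the requirement that stored points be pairwise unequal (for instance when $B$ is countable); the names `merge` and `classify` are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[float, float, int]   # (W, Y, N): point, averaged estimate, experience


def merge(a: Triple, b: Triple) -> Triple:
    """Merging (8): pool two averages at the same point, weighted by experience."""
    (w, y, n), (w2, y2, n2) = a, b
    if w != w2:
        raise ValueError("only triples at the same point can be merged")
    return (w, (n * y + n2 * y2) / (n + n2), n + n2)


def classify(w_star: float, w_prev: float, memory: List[Triple]) -> Tuple[int, int]:
    """Return (S_n, i) as in (7); i is the memory index when S_n = 2, else -1."""
    if w_star == w_prev:
        return 1, -1
    for i, (w, _, _) in enumerate(memory):
        if w == w_star:
            return 2, i
    return 3, -1


print(merge((1.0, 2.0, 3), (1.0, 4.0, 1)))   # experience 3 + 1 = 4
```

Because the merged estimate is an average over all pooled observations, its experience grows each time a stored point is revisited.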


Define $(\hat W_n^*,\hat Y_n^*,\hat N_n^*)$ by
$$(\hat W_n^*,\hat Y_n^*,\hat N_n^*)=\begin{cases}(W_{n-1},Y_{n-1},N_{n-1})\oplus(W_n^*,Y_n^*,N_n^*)\ \text{or}\ (W_n^*,Y_n^*,N_n^*) & \text{if } S_n=1,\\ (W_n^*,Y_n^*,N_n^*)\ \text{or}\ (W_n^*,Y_n^*,N_n^*)\oplus(W_i^{n-1},Y_i^{n-1},N_i^{n-1}) & \text{if } S_n=2\text{ and } W_n^*=W_i^{n-1},\\ (W_n^*,Y_n^*,N_n^*) & \text{if } S_n=3,\end{cases} \qquad (9)$$
where, if $S_n=1$, one either always merges or never merges and, if $S_n=2$, one either always merges or never merges. The merging operation could be randomized, but this would only complicate matters now. The consistency in the use of the merging operation and the fact that $W_{n-1}$ and the $W_i^{n-1}$, $1\le i\le T_{n-1}$, are pairwise unequal for all $n$ are important factors in the proof of the theorem of convergence given below.

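The next step is the decision whether to keep $(W_{n-1},Y_{n-1},N_{n-1})$ or to select $(\hat W_n^*,\hat Y_n^*,\hat N_n^*)$ as the new base triple. Let $D_n$ be a random variable taking values in $\{0,1\}$, where $D_n=1$ exactly when the old base triple is updated at the $n$-th iteration. Thus,
$$D_n=\begin{cases}1 & \text{if } S_n=1\text{ or } \hat Y_n^*<Y_{n-1},\\ 0 & \text{otherwise,}\end{cases} \qquad (10)$$
$$(W_n,Y_n,N_n)=\begin{cases}(\hat W_n^*,\hat Y_n^*,\hat N_n^*) & \text{if } D_n=1,\\ (W_{n-1},Y_{n-1},N_{n-1}) & \text{if } D_n=0.\end{cases} \qquad (11)$$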
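Given $S_n$, the auxiliary triple (9) and the decision (10)-(11) can be sketched as below. The sketch fixes one of the options that (9) leaves open: "always merge" when `use_merging` is true and "never merge" otherwise, for both $S_n=1$ and $S_n=2$. The helper names are hypothetical, and triples are $(W, Y, N)$ tuples.

```python
from typing import Optional, Tuple

Triple = Tuple[float, float, int]   # (W, Y, N)


def merge(a: Triple, b: Triple) -> Triple:
    """Merging (8); the caller guarantees that both triples share the same point."""
    (w, y, n), (_, y2, n2) = a, b
    return (w, (n * y + n2 * y2) / (n + n2), n + n2)


def auxiliary(s: int, star: Triple, base: Triple, stored: Optional[Triple],
              use_merging: bool = True) -> Triple:
    """Auxiliary triple (9); `stored` is the memory triple at W*_n when S_n = 2."""
    if s == 1 and use_merging:
        return merge(base, star)
    if s == 2 and use_merging:
        return merge(star, stored)
    return star


def decide(s: int, aux: Triple, base: Triple) -> Tuple[int, Triple]:
    """Decision (10)-(11): D_n = 1 if S_n = 1 or the auxiliary estimate beats Y_{n-1}."""
    d = 1 if (s == 1 or aux[1] < base[1]) else 0
    return d, (aux if d == 1 else base)


base = (0.0, 5.0, 2)
print(decide(1, auxiliary(1, (0.0, 2.0, 2), base, None), base))
```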

The only thing that is left is to obtain $H_n$ from $H_{n-1}$ and $(\hat W_n^*,\hat Y_n^*,\hat N_n^*)$. To make sure that $W_n$ and the $W_i^n$, $1\le i\le T_n$, are pairwise unequal, the following procedure is suggested.

(i) If $S_n=2$ and $W_n^*=W_i^{n-1}$, remove $(W_i^{n-1},Y_i^{n-1},N_i^{n-1})$ from $H_{n-1}$.

(ii) If $D_n=1$ and $S_n\ne1$, add $(W_{n-1},Y_{n-1},N_{n-1})$ to $H_{n-1}$ or add nothing at all. If $D_n=0$, add $(\hat W_n^*,\hat Y_n^*,\hat N_n^*)$ to $H_{n-1}$ or add nothing at all.

(iii) Any triple left in $H_{n-1}$ after (ii) can be dropped if desired. Dropping triples corresponds to a loss of memory but can sometimes be more economical.

(iv) Relabel all the triples left after (iii) so that to every $i$, $1\le i\le T_n$ ($T_n$ is the number of triples), there corresponds one and only one triple $(W_i^n,Y_i^n,N_i^n)$. This relabeled sequence of triples is $H_n$.
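Procedure (i)-(iv) leaves the adding and dropping decisions to the designer. The sketch below makes one hypothetical choice: it always adds in (ii) and, in (iii), keeps at most `capacity` triples with the lowest estimates; sorting the list also serves as the relabeling of step (iv). Triples are $(W, Y, N)$ tuples and the function name is illustrative.

```python
from typing import List, Tuple

Triple = Tuple[float, float, int]   # (W, Y, N)


def update_memory(memory: List[Triple], s: int, idx: int, d: int, aux: Triple,
                  old_base: Triple, capacity: int = 10 ** 9) -> List[Triple]:
    """Obtain H_n from H_{n-1}; `idx` is the memory index of W*_n when S_n = 2."""
    new = list(memory)                   # leave H_{n-1} itself untouched
    if s == 2:
        if not 0 <= idx < len(new):
            raise IndexError("S_n = 2 requires a valid memory index")
        del new[idx]                     # (i) remove the stored copy of W*_n
    if d == 1 and s != 1:
        new.append(old_base)             # (ii) the old basepoint is memorized
    elif d == 0:
        new.append(aux)                  # (ii) the rejected auxiliary triple is kept
    new.sort(key=lambda t: t[1])         # (iii)-(iv): lowest estimates first
    return new[:capacity]


print(update_memory([(2.0, 1.0, 1)], 3, -1, 1, (5.0, 0.1, 1), (0.0, 0.5, 4)))
```

Each branch preserves the requirement that the new basepoint and the stored points are pairwise unequal.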

The method of deciding whether to add or to drop triples from $H_{n-1}$ (in (ii) and (iii)) is not specified. In fact, this decision may depend in an arbitrary fashion upon any information available at the $n$-th iteration. The decision may be randomized and can, in an extreme case, also be made through human intervention in the search process.

Given $(W_n,Y_n,N_n)$ and $H_n$, the procedure described above is repeated for $n+1$, that is, the generation of $(W_{n+1}^*,Y_{n+1}^*,N_{n+1}^*)$ (see (6)), the computation of $(\hat W_{n+1}^*,\hat Y_{n+1}^*,\hat N_{n+1}^*)$ (see (7), (9)), the decision concerning $(W_{n+1},Y_{n+1},N_{n+1})$ (see (10)-(11)) and the determination of $H_{n+1}$ (procedure (i)-(iv)).

Remarks: We note that the algorithm can be used with $T_n=0$ for all $n$. Notice further that $T_0=0$ and that $0\le T_{n+1}\le T_n+1$ for all $n$. The memory can be called a learning memory either because the $\hat N_n^*$ are increasing (in view of $\lambda_n\to\infty$ as $n\to\infty$, or because of the merging in (9)) or because $T_n\to\infty$ as $n\to\infty$. Surprisingly, the convergence of the algorithm is not affected by the finiteness or divergence of the sequence $T_n$.

The undefined parts in the algorithm are

(a) the generation of $W_n^*$ if $Z_n=-1$;

(b) steps (ii) and (iii) in the updating of $H_{n-1}$.

It is up to the designer to use (a) and (b) to obtain high rates of convergence. Of course, some experimental know-how will be helpful. Let us briefly discuss problems (a) and (b). We say that $Q$ and $B$ define an exhaustive search problem if for every finite subset $\{w_1,\ldots,w_k\}$ of $B$, the knowledge of $Q(w_1),\ldots,Q(w_k)$ does not convey any information regarding the value of $Q(w)$ for any $w\in B$, $w\notin\{w_1,\ldots,w_k\}$. In such problems, does it still make sense to store some information in $H_n$ (i.e., to let $T_n>0$)? The answer is of course negative if the environment is deterministic.
Indeed, if the environment is $\mathcal{D}$, then it is clear that $Y_n=Q(W_n)$ wp1 for all $n$. The only information that needs to be stored is $(W_n,Y_n)$, and it is not necessary to sample the basepoint (thus, let $\beta_n=0$ for all $n$). Further, if $Z_n=-1$, the best one can do is to generate $W_n^*$ with distribution function $G$ in $B$. Therefore, if the environment is deterministic and defines an exhaustive search problem, we can let $\alpha_n=1$ and $Z_n=1$ for all $n$. In the random search literature this method is called blind search [1]. Assume next that the environment is deterministic but that $Q$ and $B$ do not define an exhaustive search problem, e.g. because $B=\mathbb{R}^m$ and $Q$ is known to be continuous. In that case it can be helpful to let $T_n>0$. If $Z_n=-1$, $T_{n-1}=0$ and $Q$ is continuous, one can let $W_n^*$ be gaussian with variance $\sigma_n^2$ and mean $W_{n-1}$ for the purpose of local hill-climbing (this method is referred to as creeping random search [1]). If $T_{n-1}>0$, the distribution of $W_n^*$ may be a mixture of gaussian distributions with centers at $W_{n-1}$ and $W_i^{n-1}$, $1\le i\le T_{n-1}$, in order to climb separate local hills simultaneously. The same strategies can be used if the environment is noisy, i.e. not $\mathcal{D}$. But for noisy environments, even when $Q$ and $B$ define an exhaustive search problem, it makes sense to store all the past observations in $H_n$, since the $Y_i^{n-1}$ are only noisy estimates of the $Q(W_i^{n-1})$ for $i=1,\ldots,T_{n-1}$. In such a case, if $Z_n=-1$, one could for instance define $W_n^*$ as follows. Let $M>0$ be fixed and consider those $W_i^{n-1}$ that correspond to the $M$ lowest values among the $Y_i^{n-1}$, $1\le i\le T_{n-1}$. Then let $W_n^*$ have a uniform distribution over those $W_i^{n-1}$. The designer can, for instance, eliminate the other $(W_i^{n-1},Y_i^{n-1},N_i^{n-1})$ from $H_{n-1}$ so that $T_n\le M$ for all $n$.
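The selection rule just described, a uniform choice among the $M$ stored points with the lowest estimates, can be sketched as follows. `propose_from_memory` is a hypothetical name, and the memory is represented as a list of $(W_i^{n-1},Y_i^{n-1},N_i^{n-1})$ tuples with scalar points.

```python
import random
from typing import List, Tuple

Triple = Tuple[float, float, int]   # (W, Y, N)


def propose_from_memory(memory: List[Triple], M: int, rng: random.Random) -> float:
    """Pick W*_n uniformly among the stored points with the M lowest estimates Y_i."""
    if not memory or M < 1:
        raise ValueError("need a nonempty memory and M >= 1")
    best = sorted(memory, key=lambda t: t[1])[:M]   # M lowest noisy estimates
    return rng.choice(best)[0]


memory = [(0.0, 5.0, 1), (1.0, 0.2, 3), (2.0, 0.1, 2), (3.0, 9.0, 1)]
rng = random.Random(0)
print(sorted({propose_from_memory(memory, 2, rng) for _ in range(200)}))
```

Revisiting promising stored points in this way also raises their experience through the merging in (9).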


A note is in order here concerning the merging operation in (9). If $T_{n-1}$ is large and one goes to the trouble of storing all or most of the past observations, it would be very inconsistent if no merging were used in (9). Further, if merging is used in (9), it is wise to let $W_n^*$ be equal to one of the $W_i^{n-1}$, $1\le i\le T_{n-1}$, with positive probability, thus increasing the experiences of the $W_i^{n-1}$ in the long run. If $T_n$ is small or zero, one can of course do without the merging in (9) as well. This would simplify the algorithm considerably because then $(\hat W_n^*,\hat Y_n^*,\hat N_n^*)=(W_n^*,Y_n^*,N_n^*)$ and $\hat N_n^*=\lambda_n$. If $T_{n-1}=0$ for all $n$, then it is easy to see that the only thing to be memorized is $(W_n,Y_n)$. The decision rule (10)-(11) reduces to
$$(W_n,Y_n)=\begin{cases}(W_n^*,Y_n^*) & \text{if } Y_n^*<Y_{n-1},\\ (W_{n-1},Y_{n-1}) & \text{otherwise.}\end{cases} \qquad (12)$$
In that case the algorithm reduces to the well-known random optimization algorithm of Matyas [20].
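With $T_n=0$, a deterministic environment and proposals that perturb the basepoint, rule (12) becomes a short loop. The sketch assumes a scalar $Q$ and gaussian perturbations with a fixed standard deviation (a creeping random search proposal); it illustrates the reduced rule and is not a transcription of Matyas's paper. The function name is hypothetical.

```python
import random
from typing import Callable, Tuple


def matyas_search(Q: Callable[[float], float], w0: float, steps: int,
                  step_sd: float, seed: int = 0) -> Tuple[float, float]:
    """Decision rule (12) with T_n = 0 in a deterministic environment.

    Proposals are gaussian perturbations of the basepoint; the basepoint
    moves only when the proposal has a strictly lower value.
    """
    rng = random.Random(seed)
    w, y = w0, Q(w0)
    for _ in range(steps):
        w_star = w + rng.gauss(0.0, step_sd)
        y_star = Q(w_star)          # deterministic: one observation is exact
        if y_star < y:              # rule (12)
            w, y = w_star, y_star
    return w, y


w, y = matyas_search(lambda v: (v - 2.0) ** 2, 10.0, 5000, 0.5)
print(round(w, 3), y)
```

In a noisy environment the same loop would compare single noisy estimates, which is why the general procedure averages $\lambda_n$ observations and keeps track of experience.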

IV. Theorem of Convergence

Theorem 1: Let $B$ be a closed set from $\mathbb{R}^m$ and let $\mathcal{E}$ be a random environment with search domain $B$. Let $Q$ be a Borel measurable mapping from $B$ to $\mathbb{R}$ and let $G$ be an arbitrary distribution function with support in $B$. Let $\{\alpha_n\}$, $\{\beta_n\}$ and $\{\gamma_n\}$ be number sequences from $[0,1]$ such that $\alpha_n+\beta_n+\gamma_n=1$ for all $n$. Let $\{\lambda_n\}$ be a sequence of positive integers and let $W_1,W_2,\ldots$ be a sequence of random vectors from $\mathbb{R}^m$ whose distribution is determined by the procedure described in Section III. If there exists a sequence $\{b_n\}$ of integers such that
$$\cdots \qquad (13)$$
$$0\le b_n\le n\quad\text{for all } n, \qquad (14)$$
$$\sum_{i=b_n}^{n}\beta_i\to\infty, \qquad (15)$$
$$\sum_{i=b_n}^{n}\alpha_i\to\infty, \qquad (16)$$
and the environment is either $\mathcal{D}$, or $\mathcal{G}$ and $\lambda_n^*/(c_n^2\log n)\to\infty$, or $\mathcal{E}_t$ for some $t\ge2$ and, in addition to the latter condition,
$$n\,c_n^t/(\lambda_n^*)^{t-1}\to0, \qquad (17)$$
where $c_n=n-b_n+1$ and $\lambda_n^*=\min(\lambda_{b_n},\lambda_{b_n+1},\ldots,\lambda_n)$, then
$$\max(Q(W_n),q_{\min})\to q_{\min}\ \text{in probability}. \qquad (18)$$
The convergence in (18) is with probability one if conditions (15)-(17) are replaced by (15')-(17'):
$$\sum_{i=b_n}^{n}\beta_i\Big/\log n\to\infty, \qquad (15')$$
$$\sum_{i=b_n}^{n}\alpha_i\Big/\log n\to\infty, \qquad (16')$$
and the environment is either $\mathcal{D}$, or $\mathcal{G}$ and $\lambda_n^*/(c_n^2\log n)\to\infty$, or $\mathcal{E}_t$ for some $t\ge2$ and, in addition to the latter condition,
$$\sum_{n=1}^{\infty}n\,c_n^t/(\lambda_n^*)^{t-1}<\infty. \qquad (17')$$

Proof: Theorem 1 is proved in the Appendix.
In some applications one is more interested in the asymptotic behavior of the expected values of the measurements, i.e. $Q(W_n^*)$, $n=1,2,\ldots$. The following theorem holds true.

Theorem 2: Let $B$ be a closed set from $\mathbb{R}^m$ and let $\mathcal{E}$ be a random environment with search domain $B$. Let $Q$ be a Borel measurable mapping from $B$ to $\mathbb{R}$ and let $G$ be an arbitrary distribution function with support in $B$. Let $\{\alpha_n\}$, $\{\beta_n\}$ and $\{\gamma_n\}$ be number sequences from $[0,1]$ such that $\alpha_n+\beta_n+\gamma_n=1$ for all $n$. Let $\{\lambda_n\}$ be a sequence of positive integers and let $W_0^*,W_0,W_1^*,W_1,\ldots$ be a sequence of random vectors from $\mathbb{R}^m$ whose distribution is determined by the procedure described in Section III. If
$$\max(Q(W_n),q_{\min})\to q_{\min}\ \text{in probability} \qquad (19)$$
and
$$\beta_n\to1, \qquad (20)$$
then
$$\max(Q(W_n^*),q_{\min})\to q_{\min}\ \text{in probability}. \qquad (21)$$

Proof: Let $\varepsilon>0$ be arbitrary and note that, since $W_n^*=W_{n-1}$ whenever $Z_n=0$,
$$P\{Q(W_n^*)>q_{\min}+\varepsilon\}\le P\{Q(W_{n-1})>q_{\min}+\varepsilon\}+(1-\beta_n).$$
Theorem 2 follows from this inequality, (19) and (20).

Remark: It turns out that the convergence in probability of $Q(W_n^*)$, as in (21), normally is the strongest possible mode of convergence. Indeed, if $\gamma_n=0$ for all $n$, it is not always possible to ensure that $Q(W_n^*)$ converges wp1. This curious but not entirely surprising result is formulated in Theorem 3. The counterexample proving Theorem 3 is given in the Appendix. The result in Theorem 3 is not absolute in the sense that for special $B$ and $\mathcal{E}$ it may be possible that $Q(W_n)$ and $Q(W_n^*)$ both tend to $q_{\min}$ wp1 as $n\to\infty$.

Theorem 3: There exist a closed set $B$ from $\mathbb{R}^m$, a Borel measurable function $Q$ from $B$ to $\mathbb{R}$, a deterministic environment $\mathcal{E}$ and a distribution function $G$ with support in $B$ such that, for all sequences $\{\alpha_n\}$, $\{\beta_n\}$ and $\{\gamma_n\}$ from $[0,1]$ with $\alpha_n+\beta_n+\gamma_n=1$ and $\gamma_n=0$, for all sequences $\{\lambda_n\}$ of positive integers and for all algorithms fitting the description of Section III, it is impossible that
$$\max(Q(W_n^*),q_{\min})\to q_{\min}\ \text{wp1}.$$

For deterministic environments, one can let $b_n=1$ in the conditions of Theorem 1. The conditions of convergence then reduce to
$$\sum_{n=1}^{\infty}\alpha_n=\infty,\qquad \sum_{n=1}^{\infty}\beta_n=\infty.$$
By a slight change in the proof of the theorem, it can be seen that the condition $\sum_n\beta_n=\infty$ can be dropped altogether.

The conditions of convergence in Theorem 1 look rather complicated. Let, for instance, $\alpha_n=A/n^{\alpha}$, $\beta_n=B/n^{\beta}$ and $\lambda_n=Cn^{\delta}$, where $A$, $B$ and $C$ are positive constants and $\delta\ge0$. If the environment is $\mathcal{E}_t$ with $t\ge2$, then (13)-(17) hold if
$$\max(\alpha,\beta)<\min\bigl(\delta/2,\ (\delta(t-1)-1)/t,\ 1\bigr), \qquad (22)$$
and (13)-(14), (15')-(17') hold if
$$\max(\alpha,\beta)<\min\bigl(\delta/2,\ (\delta(t-1)-2)/t,\ 1\bigr). \qquad (23)$$
If the environment is $\mathcal{G}$, then (13)-(17) or (13)-(14), (15')-(17') hold if
$$\max(\alpha,\beta)<\min(\delta/2,\ 1). \qquad (24)$$
For this, it suffices that $\alpha=\beta=0$ and $\delta>0$. The proofs of the sufficiency of (22)-(24) are given in the Appendix.
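The exponent conditions (22)-(24) are easy to check mechanically. In the sketch below, `a` and `b` are the exponents written $\alpha$ and $\beta$ in the text, `delta` is $\delta$, and `t=None` selects the generalized gaussian case (24); the function name is hypothetical and the check covers only these sufficient conditions, not (13)-(17) in general.

```python
from typing import Optional


def conditions_hold(a: float, b: float, delta: float, t: Optional[float] = None,
                    with_prob_one: bool = False) -> bool:
    """Check (22)-(24) for alpha_n = A/n^a, beta_n = B/n^b, lambda_n = C n^delta.

    t=None: generalized gaussian environment, condition (24).
    Otherwise E_t with t >= 2: condition (22), or (23) when with_prob_one.
    """
    if t is None:
        bound = min(delta / 2, 1.0)
    else:
        if t < 2:
            raise ValueError("the sufficient conditions are stated for t >= 2")
        shift = 2.0 if with_prob_one else 1.0
        bound = min(delta / 2, (delta * (t - 1) - shift) / t, 1.0)
    return max(a, b) < bound


print(conditions_hold(0.2, 0.2, 2.0, t=3), conditions_hold(0.7, 0.0, 2.0, t=3, with_prob_one=True))
```

For example, with $t=3$ and $\delta=2$ the bound in (22) is $1$ while the bound in (23) is $2/3$, so the almost-sure version tolerates slower-decaying $\alpha_n$, $\beta_n$ only up to exponent $2/3$.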

V. Conclusion

The theoretical properties of a large class of random search algorithms for use in stochastic optimization are discussed. To actually obtain practical algorithms, it is important to make the best use of the freedom that is left to the designer, e.g. in the choice of the sequences $\{\alpha_n\}$, $\{\beta_n\}$ and $\{\lambda_n\}$, in the procedure for the generation of $W_n^*$, and in the procedure for updating the memory contents $H_n$. As for most random search techniques, the class of random environments to be allowed is very large. This makes the algorithm suitable as a basic building block for a widely applicable optimization program in the computer library.

The designer has the option to use an algorithm with a growing memory to reduce the cost of optimization measured in terms of the number of observations. It is pointed out how a growing memory can be useful even in exhaustive (but stochastic) optimization problems. In non-exhaustive search problems, e.g. when $B=\mathbb{R}^m$ and $Q$ is continuous, other procedures to extract information from the past observations should be studied. For instance, further research is encouraged on parametric and nonparametric estimators of $Q$ that use the data collected during the search.

VI.  Appendix 

Lemma  1 :Let  X,,...,X  ,X.' X'  be  lid  random 

1 n 1 2 — n 

variables  with  E[X,  ]=0  and  E[X,  }=o  <».If  S = L X, 
n 1 l n l-i 

and  S'»E  (X-X!)  .then  1 l* 

n 1=1,  1 1 

P[U  [|S  A|*«]}*6  EP(|S'  k-l/2IC|sc/8] 
kin  kilog^n 

for  all  n and  c>o  with  n*2>8o*. 

Proof:  Let  uY  denote  the  median  of  a random  variable 
Y.By  P. Levy's  symmetrlzatlon  Inequality  and  the 
fact  that  if  E{Y]=0,  then  | |*(2E[Y2))* 

Pi  U [|S  A|ic}}  <P{  U [|(S  -uS)A|it/2]} 
kin  kin  K * 

+ Pi  U {|u3A|i«/2]}* 
kin 

2P[  U [|S'A|ic/2]]+P{  ulttiAjHic/S)}. 
kin  kin 

The  last  term  on  the  right-hand  side  of  this  in- 
equality is  0 If  ncV4>2o^. Arguing  as  In  Loeve  [21, 
pp. 252-253] , we  have  for  2lc_1<n*2  , 

lSn/n  '=  I (Sn-S2  k-l)/n  + S’2k-l/n  I 

S's;-s2k-ii/n+  is;wi/2’t'1 

*2(|s;-s'k_i|A,t+  |s-M|/ak). 

By  another  application  of  Levy's  symmetrlzatlon  In- 
equallty,  u { |S!/J  |i  «/2  }} 

)in  J 


K4fc*‘ 


i»L  *.« 


r 


I 


2^ 


* e <p(  u kI  12  |s;-s*  |/2k* «/4)) 

,k  J=2*'M.  ' 2*1 

+ P(2|S’k.1l/2  »«/<}  ) 

2 . vauon  w 

* E(2P{|S'  -S' |/2Ki«/8}«.p{|3'  .|/2ki«*))  an  event 

. •>  = * „*-i 


Proof  of  theorem  1 :Let  «>0  be  arbitrary  and  let  [b  } 
be  a lequence  of  Integers  satisfying  (13-17). If  c =n 
n_b  +l,and  W is  a random  vector  with  distribution 
function  G.then  we  make  the  following  crucial  obser- 


vation where  we  use  1 


,k 
2 an 


= 3 EPUS-^l/2  i,/8). 


k*log2n 


{0(W)>q 

n min 


u{n 


{.} 
♦ « ) C I 


to  denote  the  indicator  of 
n 

r Ir 


{2=0} 


= 0} 


n 


n 


Lemma  2 :Let  be  lid  random  variables  with 

E{Xj}*0  and  E £ |x^  | 1 } ■>  for  some  ti2,then 
P(  utlSjA^JJsC/s^.C.e-^"*2 


kan  2 2/t 

for  all  n and  *>0  with  n*  >8L  ' 


♦ C2o 

where 


C1=24  (l+2/t)tL8t,C2=12/(l-exp(-16e2))  and  Cj- 
1/(32  e‘(2+t) L2/t). 

Proof  : From  lemma  1 ,an  inequality  of  Fuk  and  Na- 
gaev  [22,pp.654]  and  E[X2)=c2‘L2/l , we  have  with 

1 . V V V * 


S’=X.  + . . ,+X  -X  . 
n 1 n n+ 1 


2n 


P[  U(|S.  |Aa«}}  * 6 2 P(  | S' . |/2*i  »/>} 

kin  kilog2n  2* 

*6  z 2(l+2/t)tL/((*/6)t(2*"X)k) 


kilog2n 


+ 2 exp(-2  et2*C(t/fl)2/(t+2)cr2 ) 


P[S  /mi)  *P { E X in*  }‘e'xn*(E{exXl])n 
1=1. 

s exp(-\n*+nx2o2/2(l- |x|L))  for  all  |x|L<l{ 

With  xL=*L/o2(l+-Lc/o2)  ,we  obtain 

P[S  /n*  c } s exp(-nt2/2o2(l+cL/<j2)). 


tL/o*)) . 
i‘-c  ) . 


so  that  by 


The  samenbound  Is  valid  for  pis  /n 

a combination  of  bounds,  n 

P{|S  /n  | i c J *2  exp(-nc2/2(o2+Lt)) . 

0 7 2 

By  lemma  l.for  all  n with  n«  >8a  , 

P{  U{|S  A |»<]]«  12  E exp(-2k(«/8)2/2.(o2+L*/®)) 
kan  kalog2n 

‘12  exp(^(*8)2/2(<,2+L«/9))/(l-e"n(*/8)  Alp 

* C4  exp(-nt2/(128o2+16Lc)). 


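$$[Q(W_n^*) > q_{\min} + \varepsilon] \subset \Big[\sum_{i=b_n}^{n} 1_{[Z_i=0]} = 0\Big] \cup \bigcup_{i=b_n}^{n}\{|Y_i^* - Q(W_i^*)| > \varepsilon/4c_n\} \cup \bigcap_{i=b_n}^{n}[Q(W_i^*) > q_{\min} + \varepsilon/2].$$

First, it is clear that

$$P\{Q(W_n^*) > q_{\min} + \varepsilon\} \le P\Big\{\sum_{i=b_n}^{n} 1_{[Z_i=0]} = 0\Big\} + P\Big\{\bigcup_{i=b_n}^{n}\{|Y_i^* - Q(W_i^*)| > \varepsilon/4c_n\}\Big\} + P\Big\{\bigcap_{i=b_n}^{n}[Q(W_i^*) > q_{\min} + \varepsilon/2]\Big\}. \qquad (25)$$

Also,

$$P\Big\{\sum_{i=b_n}^{n} 1_{[Z_i=0]} = 0\Big\} \le \prod_{i=b_n}^{n}(1 - \beta_i) \le \exp\Big(-\sum_{i=b_n}^{n}\beta_i\Big). \qquad (26)$$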

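$$P\Big\{\bigcap_{i=b_n}^{n}[Q(W_i^*) > q_{\min} + \varepsilon/2]\Big\} \le P\Big\{\sum_{i=b_n}^{n} 1_{[Z_i=1]} < \tfrac12\sum_{i=b_n}^{n}\alpha_i\Big\} + P\Big\{\Big[\sum_{i=b_n}^{n} 1_{[Z_i=1]} \ge \tfrac12\sum_{i=b_n}^{n}\alpha_i\Big] \cap \bigcap_{i=b_n}^{n}[Q(W_i^*) > q_{\min} + \varepsilon/2]\Big\}$$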

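$$\le \frac{C_1}{2\varepsilon^t}\sum_{k\ge\log_2 n}(2^{t-1})^{-k} + C_2\big(1 - e^{-1/16e^2}\big)\sum_{k\ge\log_2 n}\exp(-C_3\varepsilon^2 2^k)$$

$$\le \frac{C_1}{\varepsilon^t n^{t-1}} + \frac{e^{-C_3 n\varepsilon^2}}{1 - e^{-C_3 n\varepsilon^2}}\,C_2\big(1 - e^{-1/16e^2}\big)$$

$$\le \frac{C_1}{\varepsilon^t n^{t-1}} + C_2 e^{-C_3 n\varepsilon^2}$$

for all $n$ with $n\varepsilon^2 > 8L^{2/t}$, in view of $e^{-C_3 n\varepsilon^2} \le e^{-8C_3 L^{2/t}}$. In the second step we used the fact that, for all $a > 1$, $b \ge 2$ and integers $K$,

$$\sum_{k\ge K} a^{-b^k} \le a^{-b^K}\big/\big(1 - a^{-b^K}\big),$$

which holds because $b^k \ge b^K(1 + k - K)$ for $k \ge K$, so that the sum is dominated by a geometric series with ratio $a^{-b^K}$.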
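The summation fact used at the end of the proof of lemma 2 can be probed numerically. There it is applied with $a = e^{C_3\varepsilon^2}$ and $b = 2$. The sketch below compares partial sums of $a^{-b^k}$, $k \ge K$, with $a^{-b^K}/(1 - a^{-b^K})$. The function names are ours. For $b = 2$ the bound holds in every tested case. For $b$ close to 1 it fails (for example $a = e$, $b = 1.1$, $K = 0$), which is why the geometric domination needs $b \ge 2$.

```python
import math


def tail_sum(a: float, b: float, K: int, max_terms: int = 500) -> float:
    """Sum of a^(-b^k) for k = K, K+1, ... until the terms underflow (or max_terms)."""
    total = 0.0
    log_a = math.log(a)
    for k in range(K, K + max_terms):
        exponent = b**k * log_a
        if exponent > 745:  # exp(-745) underflows to 0.0; later terms are even smaller
            break
        total += math.exp(-exponent)
    return total


def geometric_bound(a: float, b: float, K: int) -> float:
    """a^(-b^K) / (1 - a^(-b^K)), the claimed upper bound."""
    q = math.exp(-(b**K) * math.log(a))
    return q / (1 - q)


for a in (1.01, 1.5, math.e, 10.0):
    for K in (0, 1, 3, 5):
        print(a, K, tail_sum(a, 2, K) <= geometric_bound(a, 2, K))

# With b = 1.1 the inequality no longer holds.
print(tail_sum(math.e, 1.1, 0), geometric_bound(math.e, 1.1, 0))
```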
kiK 

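Lemma 3: Let $X_1, X_2, \ldots$ be iid random variables with $E\{X_1\} = 0$ and

$$E\{e^{xX_1}\} \le e^{x^2\sigma^2/2(1-|x|L)} \quad\text{for all } x \text{ with } |x|L < 1,$$

for some $\sigma > 0$ and $L \ge 0$. Then

$$P\Big\{\bigcup_{k\ge n}\{|S_k/k| \ge \varepsilon\}\Big\} \le C_4\exp\big(-n\varepsilon^2/(128\sigma^2 + 16L\varepsilon)\big)$$

for all $n$ and $\varepsilon > 0$ with $n\varepsilon^2 > 8\sigma^2$, where $C_4 = 12/\big(1 - \exp(-1/16(1 + L\varepsilon/8\sigma^2))\big)$.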

Proof: It is easy to see that $E\{X_1^2\} \le \sigma^2$. Note also that, for all $n$, by Chebyshev's inequality,


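$$\le P\Big\{\sum_{i=b_n}^{n} 1_{[Z_i=1]} < \tfrac12\sum_{i=b_n}^{n}\alpha_i\Big\} + \big(P[Q(W) > q_{\min} + \varepsilon/2]\big)^{\frac12\sum_{i=b_n}^{n}\alpha_i}.$$

Using Bennett's inequality (see, e.g., [22]) and the fact that, by the definition of $q_{\min}$, $P\{Q(W) > q_{\min} + \varepsilon/2\} = 1 - \theta$ for some $\theta > 0$, the right-hand side of the last inequality is upper bounded by

$$(1-\theta)^{\frac12\sum_{i=b_n}^{n}\alpha_i} + \exp\Big(-c_n\Big(\sum_{i=b_n}^{n}\frac{\alpha_i}{2c_n}\Big)^2\Big/\Big(2\sigma_n^2 + \frac23\sum_{i=b_n}^{n}\frac{\alpha_i}{2c_n}\Big)\Big),$$

where

$$\sigma_n^2 = \sum_{i=b_n}^{n}\alpha_i(1-\alpha_i)/c_n \le \sum_{i=b_n}^{n}\alpha_i/c_n.$$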

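Therefore, we can conclude that

$$P\Big\{\bigcap_{i=b_n}^{n}[Q(W_i^*) > q_{\min} + \varepsilon/2]\Big\} \le \exp\Big(-\frac{\theta}{2}\sum_{i=b_n}^{n}\alpha_i\Big) + \exp\Big(-\sum_{i=b_n}^{n}\alpha_i/10\Big). \qquad (27)$$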
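The step from Bennett's inequality to (27) can be checked on concrete numbers. The sketch below assumes, as in the derivation above, that the indicators $1_{[Z_i=1]}$ are independent with $P\{Z_i = 1\} = \alpha_i$. It computes exactly, by dynamic programming, the probability that fewer than $\frac12\sum\alpha_i$ of them equal 1. It then compares that probability with the Bernstein-form bound $\exp(-(m/2)^2/(2v + m/3))$, where $m = \sum\alpha_i$ and $v = \sum\alpha_i(1-\alpha_i)$, and with $\exp(-m/10)$. The schedule $\alpha_i = i^{-1/2}$ and the window $1001, \ldots, 2000$ are our own hypothetical choices. Since $v \le m$, the Bernstein exponent is at least $3m/28 > m/10$, which is where the constant 10 comes from.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of N = sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)  # this indicator is 0
            new[k + 1] += mass * p    # this indicator is 1
        pmf = new
    return pmf


def lower_tail(probs):
    """P{N < m/2} with m = sum(probs)."""
    m = sum(probs)
    return sum(mass for k, mass in enumerate(poisson_binomial_pmf(probs)) if k < m / 2)


def bernstein_bound(probs):
    """exp(-(m/2)^2 / (2v + m/3)), v = sum p(1-p); since v <= m this is <= exp(-3m/28)."""
    m = sum(probs)
    v = sum(p * (1 - p) for p in probs)
    return math.exp(-((m / 2) ** 2) / (2 * v + m / 3))


# Hypothetical window b_n = 1001, ..., n = 2000 with alpha_i = i^(-1/2).
alphas = [i ** -0.5 for i in range(1001, 2001)]
m = sum(alphas)
print(lower_tail(alphas), bernstein_bound(alphas), math.exp(-m / 10))
```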

Next,

$$P\Big\{\bigcup_{i=b_n}^{n}\{|Y_i^* - Q(W_i^*)| > \varepsilon/4c_n\}\Big\} \le P\Big\{\bigcup_{i=b_n}^{n}\bigcup_{j\ge\lambda_{b_n}}\{|Y(W_i^*, j) - Q(W_i^*)| > \varepsilon/4c_n\}\Big\}$$

$$\le n\sup_{w\in B} P\Big\{\bigcup_{j\ge\lambda_{b_n}}\{|Y(w, j) - Q(w)| > \varepsilon/4c_n\}\Big\},$$

where $Y(w, \lambda) = \sum_{j=1}^{\lambda} Y_j/\lambda$ and the $Y_j$, $j \ge 1$, are iid random variables with distribution function $F_w$. Note that we used the fact that the merging in (9) is applied consistently and that, for all $n$, the $W_n$ and $W_i^*$, $1 \le i \le n$, are pairwise unequal.

From lemmas 2 and 3 we know that, for all $n$ large enough,

$$\sup_{w\in B} P\Big\{\bigcup_{j\ge\lambda_{b_n}}\{|Y(w, j) - Q(w)| > \varepsilon/4c_n\}\Big\} \le g_n,$$

where

$$g_n = \begin{cases} 0 & \text{if the environment is deterministic},\\ K_1 c_n^t/\lambda_{b_n}^{t-1} + K_2 e^{-K_3\lambda_{b_n}/c_n^2} & \text{if the environment is } E_t\ (t \ge 2) \text{ and } \lambda_{b_n}/c_n^2 \ge K_4,\\ K_5\exp\big(-K_6\lambda_{b_n}/(c_n^2 + K_7 c_n)\big) & \text{if the environment is } J \text{ and } \lambda_{b_n}/c_n^2 \ge K_8,\end{cases}$$

and where the $K_i$ are positive constants that depend upon $\varepsilon$ and the parameters of the environment, e.g., $L$ and $t$ if the environment is $E_t$ with parameter $L$, or $L$ and $\sigma^2$ if the environment is $J$. Let $d = \min(1/10, \theta/2)$ so that, after collecting bounds and resubstituting in (25), we obtain, for all $n$ large enough,

$$P\{Q(W_n^*) > q_{\min} + \varepsilon\} \le \exp\Big(-\sum_{i=b_n}^{n}\beta_i\Big) + 2\exp\Big(-d\sum_{i=b_n}^{n}\alpha_i\Big) + n g_n. \qquad (28)$$

Clearly, (28) and (13-17) imply (18). The second part of the theorem follows from (28), (15'-17') and the Borel-Cantelli lemma [21]. Indeed, it is easy to check that for all $\varepsilon > 0$,

$$\sum_{n=1}^{\infty} P\{Q(W_n^*) > q_{\min} + \varepsilon\} < \infty$$

by a repeated use of the fact that, for any sequence $\{a_n\}$ of nonnegative real numbers, $a_n/\log n \to \infty$ if and only if $\sum_{n\ge 2} n^r e^{-a_n} < \infty$ for every $r > 0$. This concludes the proof of theorem 1.

Proof of theorem 3. Let $B = \{w_0, w_1\}$, let $Q(w_0) = 0$, $Q(w_1) = 1$, and let the environment be deterministic. Let $G$ put mass $1/2$ each at $w_0$ and $w_1$. Let $\gamma_n = 0$ for all $n$ and consider the algorithm described in section III with $\lambda_n = 1$ (this is without loss of generality since the environment is deterministic). Theorem 3 is proved if we can show that

(I) if $\sum_{n=1}^{\infty}\alpha_n = \infty$, then $P\{\bigcup_{k\ge n}\{Q(W_k^*) > \frac12\}\} = 1$ for all $n$;

(II) if $\sum_{n=1}^{\infty}\alpha_n < \infty$, then $P\{Q(W_n) > \frac12\} \ge \frac12\exp(-\sum_{n=1}^{\infty}\alpha_n)$ for all $n$.

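For (I), with arbitrary $n$, we argue as follows. Since $\sum_{k=n}^{\infty}\alpha_k = \infty$ for all $n$,

$$P\Big\{\bigcup_{k\ge n}\{Q(W_k^*) > \tfrac12\}\Big\} \ge P\Big\{\bigcup_{k\ge n}\{Z_k = 1, W_k^* = w_1\}\Big\}$$

$$= 1 - P\Big\{\bigcap_{k\ge n}\big(\{Z_k = 0\} \cup \{Z_k = 1, W_k^* = w_0\}\big)\Big\}$$

$$= 1 - \prod_{k\ge n}\big(P\{Z_k = 0\} + P\{Z_k = 1\}/2\big) = 1 - \prod_{k\ge n}(1 - \alpha_k/2) \ge 1 - \exp\Big(-\sum_{k\ge n}\alpha_k/2\Big) = 1.$$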

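If $\sum_{n=1}^{\infty}\alpha_n = 0$, then, with probability one, $W_0 = W_1 = W_2 = \cdots$, so that for all $n$, $P\{Q(W_n) > \frac12\} = P\{Q(W_0) > \frac12\} = \frac12$. Next, let $\sum_{n=1}^{\infty}\alpha_n = A < \infty$. Then

$$P\{Q(W_n) > \tfrac12\} \ge P\{Q(W_0) > \tfrac12\}\,P\Big\{\bigcap_{k=1}^{n}\big(\{Z_k = 1, W_k = w_1\} \cup \{Z_k = 0\}\big)\Big\} = \tfrac12\prod_{k=1}^{n}(1 - \alpha_k/2)$$

$$\ge \tfrac12\prod_{k=1}^{n}\exp\big(-\alpha_k/(2 - \alpha_k)\big) \ge \tfrac12\prod_{k=1}^{n}e^{-\alpha_k} \ge \tfrac12 e^{-A},$$

where we used the inequality $1 - u \ge \exp(-u/(1-u))$ for $0 \le u < 1$. This proves (II). Next,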

$$P\{Q(W_n^*) > \tfrac12\} \ge \beta_n P\{Q(W_{n-1}) > \tfrac12\} \ge \beta_n e^{-A}/2$$

and $\liminf_{n\to\infty} P\{Q(W_n^*) > \frac12\} \ge e^{-A}/2$, because the summability of the $\alpha_n$ implies that $\alpha_n \to 0$ and $\beta_n \to 1$. This and (I) show that it is impossible that $Q(W_n^*) \to q_{\min}$ w.p.1.
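The two regimes in the proof of theorem 3 can be illustrated through the product $\prod(1 - \alpha_k/2)$, which drives both (I) and (II). The sketch below uses two schedules of our own choosing. With $\alpha_k = 1/(k+1)^2$ (summable), $\frac12\prod_{k\le n}(1 - \alpha_k/2)$ stays above $\frac12 e^{-A}$. With $\alpha_k = 1/k$ (not summable), $\prod_{k\le n}(1 - \alpha_k/2)$ tends to 0, so the point $w_1$ is eventually drawn with probability one. The sketch also spot-checks the inequality $1 - u \ge \exp(-u/(1-u))$ for $0 \le u < 1$.

```python
import math


def half_product(alphas):
    """(1/2) * prod(1 - a/2): the lower bound in (II) for P{Q(W_n) > 1/2}."""
    result = 0.5
    for a in alphas:
        result *= 1 - a / 2
    return result


# Summable schedule (our choice): A = sum alpha_k < infinity.
summable = [1 / (k + 1) ** 2 for k in range(1, 10_001)]
A = sum(summable)
print(half_product(summable), 0.5 * math.exp(-A))

# Non-summable schedule (our choice): prod(1 - alpha_k/2) -> 0.
divergent = [1 / k for k in range(1, 100_001)]
print(2 * half_product(divergent))

# Spot check of 1 - u >= exp(-u/(1-u)) on a grid in [0, 1).
print(all(1 - u >= math.exp(-u / (1 - u)) for u in (i / 1000 for i in range(1000))))
```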

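Proof of the sufficiency of (22-24): We display a sequence $\{b_n\}$ with $c_n = n - b_n + 1$ for all $n$ such that (13-17) or (13-14, 15'-17') hold. Let $c_n \sim n^{\gamma}$ for some $0 < \gamma < 1$. Because $\beta < 1$ and $\alpha < 1$, we have that

$$\sum_{i=b_n}^{n}\alpha_i \sim n^{\gamma-\alpha} \quad\text{and}\quad \sum_{i=b_n}^{n}\beta_i \sim n^{\gamma-\beta}.$$

Also,

$$\lambda_{b_n}/(c_n^2\log n) \sim n^{\delta-2\gamma}/\log n$$

and

$$n c_n^t/\lambda_{b_n}^{t-1} \sim n^{1+\gamma t-\delta(t-1)}.$$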

If (22) holds, then we can find a $0 < \gamma < 1$ such that (13-17) is satisfied for type $E_t$ environments with $t \ge 2$. Similarly, (24) is sufficient for (13-17) and for (15'-17') for type $J$ environments. Finally, (23) is sufficient for (13-14, 15'-17') for type environments.
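The asymptotics $\sum_{i=b_n}^{n}\alpha_i \sim n^{\gamma-\alpha}$ used above hold when $\alpha_i$ behaves like $i^{-\alpha}$ and the window has $c_n \sim n^{\gamma}$ terms, because every term in the window is then $n^{-\alpha}(1 + O(n^{\gamma-1}))$. The sketch below assumes exactly this form, $\alpha_i = i^{-\alpha}$ and $c_n = \lceil n^{\gamma}\rceil$. This is our reading, since conditions (22)-(24) are not reproduced in this section. It prints the ratio of the window sum to $n^{\gamma-\alpha}$, which approaches 1 as $n$ grows.

```python
import math


def window_sum(n: int, gamma: float, alpha: float) -> float:
    """sum_{i=b_n}^{n} i^(-alpha) with c_n = ceil(n^gamma) and b_n = n - c_n + 1."""
    c_n = max(1, math.ceil(n ** gamma))
    b_n = n - c_n + 1
    return sum(i ** (-alpha) for i in range(b_n, n + 1))


gamma, alpha = 0.6, 0.3  # arbitrary illustrative values
for n in (10**3, 10**5, 10**7):
    print(n, window_sum(n, gamma, alpha) / n ** (gamma - alpha))
```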


[1] G. J. McMURTRY: "Adaptive optimization procedures", in: "Adaptive, Learning and Pattern Recognition Systems", J. M. Mendel and K. S. Fu, Eds., Academic Press, N.Y., 1970

[2] R. A. JARVIS: "Optimization strategies in adaptive control: a selective survey", IEEE Trans. on Syst., Man and Cybernetics, vol. SMC-5, No. 1, pp. 83-94, 1975

[3] L. D. COCKRELL, K. S. FU: "On search techniques in adaptive systems", Techn. Rept. TR-EE-70-1, Purdue Univ., Lafayette, Ind., 1970

[4] L. S. GURIN: "Random search in the presence of noise", Engineering Cybernetics, vol. 4, No. 3, pp. 252-260, 1966

[5] L. P. DEVROYE: "On the convergence of statistical search", IEEE Trans. on Syst., Man and Cybernetics, vol. SMC-6, No. 1, pp. 46-56, 1976

[6] I. J. SHAPIRO, K. S. NARENDRA: "The use of stochastic automata for parameter self-optimization with multimodal performance criteria", IEEE Trans. on Syst. Sci. and Cybernetics, vol. SSC-5, No. 4, 1969

[7] R. VISWANATHAN, K. S. NARENDRA: "Stochastic automata models with applications to learning systems", IEEE Trans. on Syst., Man and Cybernetics, vol. SMC-3, No. 1, pp. 107-111, 1973

[8] L. P. DEVROYE: "Probabilistic search as a strategy selection procedure", IEEE Trans. on Syst., Man and Cybernetics, vol. SMC-6, No. 4, pp. 315-321, 1976

[9] G. J. McMURTRY, K. S. FU: "A variable structure automaton used as a multimodal searching technique", IEEE Trans. on Automatic Control, vol. AC-11, No. 3, pp. 379-387, 1966

[10] J. KIEFER, J. WOLFOWITZ: "Stochastic estimation of the maximum of a regression function", Ann. Math. Stat., vol. 23, No. 3, pp. 462-466, 1952

[11] H. J. KUSHNER: "Stochastic approximation algorithms for the local optimization of functions with nonunique stationary points", IEEE Trans. Automat. Contr., vol. AC-17, No. 5, pp. 646-654, 1972

[12] Ya. Z. TSYPKIN: "Smoothed randomized functionals and algorithms in adaptation and learning theory", Automat. Remote Contr., vol. 32, No. 8, pp. 1190-1200, 1971

[13] G. R. GUCKER: "Stochastic gradient algorithms for searching multidimensional multimodal surfaces", Techn. Rept. TR-6778-7, Stanford Univ., Cal., 1969

[14] B. T. POLYAK, Ya. Z. TSYPKIN: "Pseudogradient adaptation and training algorithms", Automat. Remote Contr., vol. 34, No. 3, pp. 377-397, 1973

[15] V. Ya. KATKOVNIK, O. Yu. KUL'CHITSKII: "Convergence of a class of random search algorithms", Automat. Remote Contr., vol. 33, No. 8, pp. 1321-1326, 1972

[16] E. M. VAYSBORD, D. B. YUDIN: "Multiextremal stochastic approximation", Engineering Cybernetics, vol. 6, No. 5, pp. 1-11, 1968

[17] J. D. HILL: "A search technique for multimodal surfaces", IEEE Trans. on Syst. Sci. and Cybernetics, vol. SSC-5, No. 1, pp. 1-11, 1969

[18] A. N. MUCCIARDI: "A new class of search algorithms for adaptive computation", in: Proceedings of the Conf. on Decision and Control, San Diego, Cal., pp. 94-100, 1973

[19] R. W. MACLAREN: "A continuous-valued learning controller for the global optimization of stochastic control systems", in: "Pattern Recognition and Machine Learning", K. S. Fu, Ed., Plenum Press, N.Y., pp. 263-276, 1971

[20] J. MATYAS: "Random optimization", Automat. Remote Contr., vol. 26, No. 2, pp. 244-251, 1965

[21] M. LOEVE: "Probability Theory", 3rd ed., Van Nostrand, Princeton, N.J., 1968

[22] D. K. FUK, S. V. NAGAEV: "Probability inequalities for sums of independent random variables", Theory of Probab. and its Applic., vol. 16, No. 4, pp. 643-660, 1971

VIII.  Acknowledgement 

This work was supported by the Air Force under grant AFOSR 72-2371.


